Cost and Benefit of Using WordNet Senses for Sentiment Analysis
Authors
Abstract
Accuracy is typically used to represent the performance of an NLP system. However, the accuracy attained is a function of the investment in annotation: typically, the greater the amount and sophistication of the annotation, the higher the accuracy. A moot question, however, is whether the improvement in accuracy is commensurate with the cost incurred in annotation. We present an economic model to assess the marginal benefit accruing from an increase in the cost of annotation. As a case in point, we have chosen the Sentiment Analysis (SA) problem. In SA, documents are normally classified for polarity by running them through classifiers trained on document vectors constructed from lexeme features, i.e., words. If, however, word senses (synset ids in wordnets) are used as features instead of words, the accuracy improves dramatically. But is this improvement significant enough to justify the cost of annotation? To the best of our knowledge, this question has not been investigated with the seriousness it deserves. We perform a cost-benefit study based on a vendor-machine model. By setting up a cost price, selling price and profit scenario, we show that although sense annotation incurs extra cost, the profit margin is high enough to justify it. Additionally, we show that a system that uses sense annotation reaches its break-even point earlier than a system that does not. Our study is timely in the current NLP scenario of machine learning (ML) applied to richly annotated data, frequently obtained through crowd-sourcing, which involves monetary investment.
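The break-even comparison described in the abstract can be illustrated with a small calculation. The sketch below is a minimal formalization under assumptions of our own, not the authors' vendor-machine model: annotation is a one-time cost C, each processed document costs c to run, and revenue p is earned only when a document is classified correctly. With accuracy a, the expected margin per document is a·p − c, and the break-even volume is the smallest N with N(a·p − c) ≥ C. The class name `SystemModel` and all figures are hypothetical, chosen only to show the arithmetic; they are not results from the paper.

```python
import math
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass
class SystemModel:
    """Hypothetical cost/revenue model for one sentiment classifier.

    All quantities are illustrative assumptions, not figures from the paper.
    Fractions keep the arithmetic exact, so break-even points are not
    distorted by floating-point rounding.
    """
    name: str
    annotation_cost: Fraction  # one-time cost of annotating the training data
    accuracy: Fraction         # fraction of documents classified correctly
    selling_price: Fraction    # assumed revenue per correctly classified document
    unit_cost: Fraction        # assumed running cost per document processed

    def margin_per_document(self) -> Fraction:
        # Expected profit per document: only correct outputs earn revenue.
        return self.accuracy * self.selling_price - self.unit_cost

    def profit(self, n_documents: int) -> Fraction:
        # Cumulative profit after n documents, net of the annotation outlay.
        return n_documents * self.margin_per_document() - self.annotation_cost

    def break_even(self) -> Optional[int]:
        # Smallest n with profit(n) >= 0, or None if the margin is not positive.
        margin = self.margin_per_document()
        if margin <= 0:
            return None
        return math.ceil(self.annotation_cost / margin)


# Hypothetical figures: sense annotation costs more but is assumed to raise accuracy.
word_based = SystemModel("words", Fraction(1000), Fraction("0.70"),
                         Fraction(1), Fraction("0.50"))
sense_based = SystemModel("senses", Fraction(1500), Fraction("0.85"),
                          Fraction(1), Fraction("0.50"))

for system in (word_based, sense_based):
    print(system.name, system.margin_per_document(), system.break_even())
# words 1/5 5000
# senses 7/20 4286
```

With these made-up inputs, the sense-based system costs 50% more to annotate yet breaks even after 4,286 documents instead of 5,000, because its higher assumed accuracy raises the per-document margin from 1/5 to 7/20. `Fraction` is used instead of `float` so that the ceiling in `break_even` is not thrown off by rounding.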
Similar resources
Harnessing WordNet Senses for Supervised Sentiment Classification
Traditional approaches to sentiment classification rely on lexical features, syntax-based features or a combination of the two. We propose semantic features using word senses for a supervised document-level sentiment classifier. To highlight the benefit of sense-based features, we compare word-based representation of documents with a sense-based representation where WordNet senses of the words ...
Q-WordNet: Extracting Polarity from WordNet Senses
This paper presents Q-WordNet, a lexical resource consisting of WordNet senses automatically annotated by positive and negative polarity. Polarity classification amounts to deciding whether a text (sense, sentence, etc.) may be associated with positive or negative connotations. Polarity classification is becoming important for applications such as Opinion Mining and Sentiment Analysis, which facili...
Robust Sense-based Sentiment Classification
The new trend in sentiment classification is to use semantic features for representation of documents. We propose a semantic space based on WordNet senses for a supervised document-level sentiment classifier. Not only does this show a better performance for sentiment classification, it also opens opportunities for building a robust sentiment classifier. We examine the possibility of using simil...
Semantic Tagging at the Sense Level
This paper summarizes our research in the area of semantic tagging at the word and sense levels and sets the ground for a new approach to text-level sentiment annotation using a combination of machine learning and linguistically motivated techniques. We describe a system for sentiment tagging of words and senses based on WordNet glosses and advance the treatment of sentiment as a fuzzy category.
2016 Olympic Games on Twitter: Sentiment Analysis of Sports Fans Tweets using Big Data Framework
Big data analytics is one of the most important subjects in computer science. Today, due to the increasing expansion of Web technology, a large amount of data is available to researchers. Extracting information from these data is one of the requirements for many organizations and business centers. In recent years, the massive amount of Twitter's social networking data has become a platform for ...